Coursework 2: Generative Models

Instructions

 Submission

Please submit one zip file on CATe - CW2.zip - containing the following:

  1. A version of this notebook containing your answers. Write your answers in the cells below each question, and please deliver the notebook with the cell outputs included
  2. Your trained VAE model as VAE_model.pth
  3. Your trained Generator and Discriminator: DCGAN_model_D.pth and DCGAN_model_G.pth

Training

Training the GAN will take quite a long time (multiple hours), and so you have four options:

  1. Use PaperSpace as suggested for CW1
  2. Use Lab GPUs via SSH. The VS Code Remote Development extension is recommended for this. For general Imperial remote working instructions see this post. You'll also want to set up your environment as outlined here.
  3. Use Colab and add checkpointing to the model training code; this is to handle the case where colab stops a free-GPU kernel after a certain number of hours (~4).
  4. Use Colab Pro - If you do not wish to use PaperSpace then you can pay for Colab Pro. We cannot pay for this on your behalf (this is Google's fault).

 Testing

TAs will run a testing cell (at the end of this notebook), so you are required to copy your data transform and denorm functions to a cell near the bottom of the document (it is demarcated). You are advised to check that your implementations pass these tests (in particular, JIT saving and loading may not work for certain niche functions).

General

Feel free to add architectural alterations / custom functions outside of the pre-defined code blocks, but if you manipulate the model's inputs in some way, please include the same code in the TA test cell so that our tests run easily.

**The deadline for submission is 19:00, Friday 25th February, 2022**

Setting up working environment

You will need to install pytorch and import some utilities by running the following cell:

Here we have some default pathing options which vary depending on the environment you are using. You can of course change these as you please.

Introduction

For this coursework, you are asked to implement two commonly used generative models:

  1. A Variational Autoencoder (VAE)
  2. A Deep Convolutional Generative Adversarial Network (DCGAN)

For the first part you will use the MNIST dataset (https://en.wikipedia.org/wiki/MNIST_database) and for the second the CIFAR-10 dataset (https://www.cs.toronto.edu/~kriz/cifar.html).

Each part is worth 50 points.

The emphasis of both parts lies in understanding how the models behave and learn; however, some points will be available for achieving good results with your GAN (though you should not spend too long on this).

Part 1 - Variational Autoencoder

Part 1.1 (25 points)

Your Task:

a. Implement the VAE architecture with accompanying hyperparameters. More marks are awarded for using a Convolutional Encoder and Decoder.

b. Design an appropriate loss function and train the model.


Part 1.1a: Implement VAE (25 Points)

Hyper-parameter selection

Data loading

Model Definition

Fig. 1 - VAE Diagram (with a Gaussian prior), taken from 1.

You will need to define:

Hints:


Part 1.1b: Training the Model (5 Points)

Defining a Loss

Recall the Beta VAE loss, with an encoder $q$ and decoder $p$: $$ \mathcal{L}=\mathbb{E}_{q_\phi(z \mid X)}[\log p_\theta(X \mid z)]-\beta D_{K L}[q_\phi(z \mid X) \| p_\theta(z)]$$

In order to implement this loss you will need to think carefully about your model's outputs and the choice of prior.

There are multiple accepted solutions. Explain your design choices based on the assumptions you make regarding the distribution of your data.
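As one illustration of these design choices, here is a minimal sketch of the Beta-VAE loss, assuming a Bernoulli decoder (so the reconstruction log-likelihood becomes a binary cross-entropy) and a standard normal prior; the function name `beta_vae_loss` and the `(mu, logvar)` encoder outputs are assumptions to adapt to your own model:

```python
import torch
import torch.nn.functional as F

def beta_vae_loss(recon_x, x, mu, logvar, beta=1.0):
    # Reconstruction term: with a Bernoulli decoder, maximising the
    # log-likelihood E_q[log p(X|z)] is equivalent to minimising a binary
    # cross-entropy, summed over pixels and averaged over the batch.
    bce = F.binary_cross_entropy(recon_x, x, reduction="sum") / x.size(0)
    # KL divergence between the Gaussian posterior N(mu, sigma^2) and the
    # standard normal prior, in closed form:
    # -0.5 * sum(1 + log(sigma^2) - mu^2 - sigma^2)
    kld = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp()) / x.size(0)
    return bce + beta * kld, bce, kld
```

Returning the two terms separately also makes it easy to log them for the loss curves requested later.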

Loss Explanation

Explain your choice of loss and how this relates to:

YOUR ANSWER

The loss contains two terms. The first is the BCE (reconstruction term) between the reconstructed data and the original data; the second is the KLD (regularisation term), which measures how far the latent-space distribution is from the VAE prior. The second term keeps the latent space well organised: it discourages overfitting and gives the VAE three desirable properties, namely continuity (two nearby samples in the latent space decode to similar outputs), completeness (every sample drawn from the prior decodes to reasonably meaningful content), and a disentangled latent space (each latent sample z decodes to a single, unique reconstructed image).

Part 1.2 (9 points)

a. Plot your loss curves

b. Show reconstructions and samples

c. Discuss your results from parts (a) and (b)

Part 1.2a: Loss Curves (3 Points)

Plot your loss curves (6 in total, 3 for the training set and 3 for the test set): total loss, reconstruction log likelihood loss, KL loss (x-axis: epochs, y-axis: loss). If you experimented with different values of $\beta$, you may wish to display multiple plots (worth 1 point).
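A minimal plotting sketch for these curves, assuming the per-epoch losses have been collected into a dict of lists (the `history` format and function name are assumptions, not part of the provided code):

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line inside a notebook
import matplotlib.pyplot as plt

def plot_loss_curves(history, title="VAE losses"):
    # history: dict mapping a curve name to a list of per-epoch values,
    # e.g. {"train total": [...], "test KL": [...]}.
    fig, ax = plt.subplots(figsize=(6, 4))
    for name, values in history.items():
        ax.plot(range(1, len(values) + 1), values, label=name)
    ax.set_xlabel("epochs")
    ax.set_ylabel("loss")
    ax.set_title(title)
    ax.legend()
    return fig
```

Calling this once per loss component (or once per $\beta$ value) produces the requested set of plots.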

Loss curves

Loss curves with beta variations

YOUR ANSWER

We can see that the higher the beta, the stronger the regularisation; but as the KL value is very small compared to the BCE value, it does not have a large impact. One way to cope with this would be to normalise the BCE so that both terms have the same order of magnitude.

Part 1.2b: Samples and Reconstructions (6 Points)

Visualize a subset of the images of the test set and their reconstructions as well as a few generated samples. Most of the code for this part is provided. You only need to call the forward pass of the model for the given inputs (might vary depending on your implementation).

For reference, here's some samples from our VAE.

Discussion

Provide a brief analysis of your loss curves and reconstructions:

YOUR ANSWER


Part 1.3 (11 points)

Qualitative analysis of the learned representations

In this question you are asked to qualitatively assess the representations that your model has learned. In particular:

a. Dimensionality Reduction of learned embeddings

b. Interpolating in the latent space

Part 1.3a: T-SNE on Embeddings (7 Points)

Extract the latent representations of the test set and visualize them using T-SNE (see implementation). You can use a T-SNE implementation from a library such as scikit-learn.

We've provided a function to visualize a subset of the data, but you are encouraged to also produce a matplotlib plot (please use different colours for each digit class).
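One possible matplotlib version, assuming the test-set latents and digit labels are available as arrays (the function name `tsne_plot` and the argument layout are illustrative):

```python
import numpy as np
from sklearn.manifold import TSNE
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line inside a notebook
import matplotlib.pyplot as plt

def tsne_plot(latents, labels, perplexity=30):
    # latents: (N, latent_dim) array of encoder outputs for the test set;
    # labels: (N,) digit classes, used only to colour the points.
    emb = TSNE(n_components=2, perplexity=perplexity,
               init="pca", random_state=0).fit_transform(latents)
    fig, ax = plt.subplots(figsize=(6, 6))
    sc = ax.scatter(emb[:, 0], emb[:, 1], c=labels, cmap="tab10", s=5)
    fig.colorbar(sc, ax=ax, label="digit class")
    return emb
```

Varying `perplexity` changes how tightly the clusters separate, which is worth noting in the discussion below.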

Discussion

What do you observe? Discuss the structure of the visualized representations.

Note - If you created multiple plots and want to include them in your discussion, the best option is to upload them to (e.g.) google drive and then embed them via a public share link. If you reference local files, please include these in your submission zip, and use relative pathing if you are embedding them (with the notebook in the base directory)

YOUR ANSWER

We can observe that there are still a few outliers in all the clusters. The KL term should have helped to encode them correctly, but if we look closer at these outliers we can see that even a human would have difficulty reading them, so they are either mislabelled or genuinely ambiguous digits. Moreover, the higher the perplexity, the more clearly the clusters separate from each other; however, in the plot with perplexity=5 we can already identify the boundaries, so there is no need to increase the perplexity further, as the plot already shows the completeness and continuity of the VAE we trained. Last but not least, it is important to keep in mind that we are projecting vectors from 8 dimensions down to 2: many configurations can produce the same 2D picture, and the plot does not show how the data points are organised in the original 8 dimensions, which is what we would need to fully understand the interpolation that follows. Finally, I would say that t-SNE is a good tool, since we can only plot up to 3 dimensions, and it appears quite reliable.

Part 1.3b: Interpolating in $z$ (4 Points)

Perform a linear interpolation in the latent space of the autoencoder by choosing any two digits from the test set. What do you observe regarding the transition from one digit to the other?

Discussion

What did you observe in the interpolation? Is this what you expected?

YOUR ANSWER

In my interpolation, I observe that the 3 actually lies somewhere between the 5 and the 2 in the latent space. In the first steps the upper-right corner of the 5 is rounded off, which makes it look like a 3; then the strokes in the upper-left and lower-right corners start to break. This shows how continuous the method is. Regarding the relation with the t-SNE visualisation, the t-SNE plot does sometimes place the 3 between the 5 and the 2, or close to them, which is the case above. But since that plot is 2-dimensional, we cannot always tell whether the 3 would lie between the 5 and the 2 in the full latent_dim-dimensional space.

Part 2 - Deep Convolutional GAN

In this task, your main objective is to train a DCGAN (https://arxiv.org/abs/1511.06434) on the CIFAR-10 dataset. You should experiment with different architectures and tricks for stability in training (such as using different activation functions, batch normalization, different values for the hyper-parameters, etc.). In the end, you should provide us with:

Part 2.1 (30 points)

Your Task:

a. Implement the DCGAN architecture.

b. Define a loss and implement the Training Loop

c. Visualize images sampled from your best model's generator ("Extension" Assessed on quality)

d. Discuss the experimentations which led to your final architecture. You can plot losses or generated results by other architectures that you tested to back your arguments (but this is not necessary to get full marks).

Clarification: You should not worry too much about getting "optimal" performance from your trained GAN. We want you to demonstrate that you experimented with different types of DCGAN variations, report what difficulties transpired throughout the training process, etc. In other words, if we see that you provide a running implementation, that you detail the different experiments you ran before arriving at your best one, and that you have grasped the concepts, you can still get good marks. The attached model does not have to be perfect, and the extension marks for performance are only worth 10 points.

Part 2.1a: Implement DCGAN (8 Points)

Fill in the missing parts in the cells below in order to complete the Generator and Discriminator classes. You will need to define:

Recommendations for experimentation:

Some general recommendations:

Try to follow the common practices for CNNs (e.g. small kernels, max pooling, ReLU activations) in order to narrow down your possible choices.

**Your model should not have more than 25 Million Parameters**

The number of epochs needed to train the network will vary depending on your choices. As a rule of thumb, we recommend allowing around 20 epochs while experimenting; if the loss doesn't drop sufficiently, restart the training with a more powerful architecture. You don't need to train the network to an extreme if you don't have the time.

Data loading

We'll visualize a subset of the test set:

Model Definition

Define hyperparameters and the model

TOY Conv layers for Generator
TOY Conv layers for Discriminator

Initialize Model and print number of parameters

You can use the method weights_init to initialize the weights of the Generator and Discriminator networks. Otherwise, implement your own initialization, or do not use any at all. You will not be penalized for not using initialization.
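For reference, such an initializer typically follows the DCGAN paper's scheme (zero-mean Gaussians with standard deviation 0.02); a sketch of that common version, which may differ from the provided `weights_init`:

```python
import torch.nn as nn

def weights_init(m):
    # DCGAN-style initialization: N(0, 0.02) for conv weights,
    # N(1, 0.02) for batch-norm scales, zero batch-norm biases.
    classname = m.__class__.__name__
    if classname.find("Conv") != -1:
        nn.init.normal_(m.weight.data, 0.0, 0.02)
    elif classname.find("BatchNorm") != -1:
        nn.init.normal_(m.weight.data, 1.0, 0.02)
        nn.init.constant_(m.bias.data, 0.0)
```

Apply it with `netG.apply(weights_init)` and `netD.apply(weights_init)` after constructing the networks.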

Part 2.1b: Training the Model (12 Points)

Defining a Loss

Choose and initialize optimizers

Define fixed input vectors to monitor training and mode collapse.
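A minimal sketch of such fixed vectors (`latent_dim = 100` is an assumption; match it to your Generator's input size):

```python
import torch

# Sample the fixed vectors once, before training. Passing the same batch
# through G after every epoch makes mode collapse easy to spot: the
# outputs start drifting towards one (or a few) identical images.
latent_dim = 100
fixed_noise = torch.randn(64, latent_dim, 1, 1)
```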

Training Loop

Complete the training loop below. We've defined some variables to keep track of things during training:
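One way to structure the alternating updates, sketched with the standard non-saturating BCE losses (the function and variable names are illustrative, not the provided skeleton):

```python
import torch
import torch.nn as nn

def train_step(G, D, real, opt_G, opt_D, latent_dim=100, device="cpu"):
    # One alternating update: train D on real + fake, then train G to fool D.
    criterion = nn.BCELoss()
    b = real.size(0)
    real_lbl = torch.ones(b, device=device)
    fake_lbl = torch.zeros(b, device=device)

    # --- Discriminator step ---
    opt_D.zero_grad()
    loss_real = criterion(D(real).view(-1), real_lbl)
    noise = torch.randn(b, latent_dim, 1, 1, device=device)
    fake = G(noise)
    # detach() so the D update does not backprop into G
    loss_fake = criterion(D(fake.detach()).view(-1), fake_lbl)
    loss_D = loss_real + loss_fake
    loss_D.backward()
    opt_D.step()

    # --- Generator step ---
    opt_G.zero_grad()
    # Non-saturating loss: push D's output on fakes towards "real"
    loss_G = criterion(D(fake).view(-1), real_lbl)
    loss_G.backward()
    opt_G.step()
    return loss_D.item(), loss_G.item()
```

Recording the returned losses per epoch gives the curves needed for Part 2.2.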

Part 2.1c: Results (10 Points)

This part is fairly open-ended, but not worth too much so do not go crazy. The table below shows examples of what are considered good samples. Level 3 and above will get you 10/10 points, level 2 will roughly get you 5/10 points and level 1 and below will get you 0/10 points.

[Example samples - Level 1]

[Example samples - Level 2]

[Example samples - Level 3]

Generator samples

Part 2.1d: Engineering Choices (10 Points)

Discuss the process you took to arrive at your final architecture. This should include:

Your Answer

Part 2.2: Understanding GAN Training (5 points)

Loss Curves

Your task:

Plot the loss curves for the discriminator $D$ and the generator $G$ as training progresses, and explain whether the produced curves are theoretically sensible and why this is (or is not) the case (x-axis: epochs, y-axis: loss).

Make sure that the version of the notebook you deliver includes these results.

Discussion

Do your loss curves look sensible? What would you expect to see and why?

YOUR ANSWER

Part 2.3: Understanding Mode Collapse (5 points)

Your task:

Describe what causes the phenomenon of mode collapse and how it may manifest in the samples from a GAN.

Based on the images created by your generator using the fixed_noise vector during training, did you notice any mode collapse? What might this behaviour be attributed to, and what did you try in order to eliminate / reduce it?

Discussion

YOUR ANSWER

Mode collapse occurs when the generator is only able to produce one or a few distinct outputs.

At the very beginning I was stuck in mode collapse for a few days: the generator kept producing the same image, a set of random pixels covering the whole square. The loss oscillated and the model never learned actual representations. I then succeeded in escaping mode collapse by adding label smoothing and Gaussian noise to the inputs.
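The two tricks mentioned above can be sketched as follows (the function names and default constants are illustrative; they are not the exact code used in training):

```python
import torch

def smooth_real_labels(batch_size, smoothing=0.1):
    # One-sided label smoothing: real targets become 0.9 instead of 1.0,
    # which stops the discriminator from becoming over-confident.
    return torch.full((batch_size,), 1.0 - smoothing)

def add_instance_noise(images, sigma=0.1):
    # Gaussian "instance noise" on the discriminator's inputs (applied to
    # real and fake batches alike); typically annealed towards zero.
    return images + sigma * torch.randn_like(images)
```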

TA Test Cell

TAs will run this cell to ensure that your results are reproducible, and that your models have been defined suitably.

Please provide the input and output transformations required to make your VAE and GANs work. If your GAN generator requires more than just noise as input, also specify this below (there are two marked cells for you to inspect)